Performance of Phylogenetic Methods in Simulation
Abstract
Computer simulations are useful because they can characterize the expected performance of phylogenetic methods under idealized conditions. However, simulation studies are also subject to several sources of bias that make the results of different simulation studies difficult to interpret and often contradictory. In this study, I examined the performance of 26 commonly used methods of phylogenetic inference for three statistical criteria: consistency, efficiency, and robustness. Methods examined included parsimony (general, weighted, and transversion), maximum likelihood (assuming Jukes-Cantor and Kimura models of DNA substitution), and UPGMA, minimum evolution, and weighted and unweighted least squares (with uncorrected, Jukes-Cantor, Kimura, modified Kimura, and gamma distances). The performance of methods was examined under three models of DNA substitution for four taxa. The branch lengths of the four-taxon trees were varied extensively in this simulation. The results indicate that most methods perform well (i.e., estimate the correct tree ≥95% of the time) over a large portion of the four-taxon parameter space. In general, maximum likelihood performed best, followed by the additive distance methods and the parsimony methods. Lake's method of invariants and UPGMA are, respectively, inefficient and extremely sensitive to branch-length inequalities. In general, differential weighting of character-state transformations increases the performance of methods when the weighting can be applied appropriately. Although methods differ in their consistency, efficiency, and robustness, additional criteria (mainly falsifiability) are extremely important considerations when choosing a method of phylogenetic inference. [Lake's invariants; maximum likelihood; minimum evolution; neighbor joining; parsimony; phylogeny estimation; simulation; weighted least squares; unweighted least squares; UPGMA; tree space.]
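The distance corrections named in the abstract convert an observed proportion of differing sites into an estimate of substitutions per site. The sketch below uses the standard published forms of the uncorrected p-distance, the Jukes-Cantor correction, Kimura's two-parameter distance, and a commonly used gamma-corrected Jukes-Cantor distance (shape parameter `alpha`). It is an illustration under those assumptions, not the code used in the study: the function names are hypothetical, sequences are assumed to be aligned uppercase A/C/G/T strings, the paper's own gamma distance may differ in form, and the "modified Kimura" distance is not reproduced because its definition is not given here.

```python
import math

PURINES = set("AG")  # pyrimidines are C and T; assumes uppercase ACGT input


def p_distance(seq1, seq2):
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc_distance(p):
    """Jukes-Cantor: d = -3/4 ln(1 - 4p/3); infinite once p >= 3/4."""
    arg = 1.0 - 4.0 * p / 3.0
    return math.inf if arg <= 0 else -0.75 * math.log(arg)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor with gamma-distributed rates across sites (shape alpha):
    d = 3/4 * alpha * ((1 - 4p/3) ** (-1/alpha) - 1)."""
    arg = 1.0 - 4.0 * p / 3.0
    return math.inf if arg <= 0 else 0.75 * alpha * (arg ** (-1.0 / alpha) - 1.0)


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance from the proportions of transitions (P)
    and transversions (Q): d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Change within a class (purine<->purine, pyrimidine<->pyrimidine)
        # is a transition; a change between classes is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    P, Q = transitions / n, transversions / n
    arg1, arg2 = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf  # saturated comparison: distance undefined
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


if __name__ == "__main__":
    a, b = "A" * 12, "GCT" + "A" * 9  # one transition, two transversions
    p = p_distance(a, b)
    print(p, jc_distance(p), k2p_distance(a, b), jc_gamma_distance(p, 0.5))
```

Returning infinity for saturated comparisons is one common convention; real packages may handle saturation differently. As a sanity check, when transversions make up two thirds of the differences (the ratio Jukes-Cantor expects), the Kimura distance equals the Jukes-Cantor distance, and the gamma distance approaches it as `alpha` grows large.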
The accurate estimation of genealogical relationships remains a central problem of phylogenetics. In addressing this problem, systematists have devised a wide variety of methods to estimate phylogeny. In fact, the number of methods available to the systematist is remarkably large: it easily exceeds 100 if one counts the various combinations of distance methods and distance metrics as individual methods. Deciding which method to use for a particular phylogenetic problem, then, is a potentially time-consuming and confusing endeavor. The intent of simulation studies in general, and this study in particular, is to relieve some of this confusion by providing some indication of the performance of phylogenetic methods under idealized conditions.
Simulations of data have limitations that are often obscured or overlooked. The main limitation of simulation studies is that, taken by themselves, they cannot indicate how methods will perform in the real world. Simulations necessarily make explicit assumptions about the evolutionary process, and the performance of methods is evaluated with respect to data generated under these evolutionary models. Unfortunately, these evolutionary models are often inadequate to describe real data (e.g., DNA sequence data; Gojobori et al., 1982; Li et al., 1984; Wheeler and Honeycutt, 1988; Goldman, 1993; Yang et al., 1994). If the models used to generate the data with which different methods will be evaluated are inadequate descriptions of nature, what is the use of simulation studies in the first place?
Simulations are useful because the phylogenetic methods themselves make assumptions about the evolutionary process. Although it may not be possible to simulate data under models representative of reality, it certainly is possible to simulate data under the conditions assumed by the method.
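Simulating data under a method's own assumptions and counting how often the method recovers the true tree can be sketched as a small Monte Carlo experiment. The sketch below assumes the Jukes-Cantor model, the four-taxon tree ((A,B),(C,D)), and branch lengths, sequence length, and replicate count chosen only for illustration. It compares parsimony with neighbor joining on Jukes-Cantor distances; for four taxa, neighbor joining reduces to picking the split that minimizes d(i,j) + d(k,l). Ties earn fractional credit, which is an assumed convention. This is not the study's actual protocol, and all function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def jc_evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t (expected substitutions
    per site) under Jukes-Cantor: a site ends in a different base with
    probability 3/4 * (1 - exp(-4t/3)), uniformly among the other three."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def jc_distance(s1, s2):
    """Jukes-Cantor distance; infinity when the sequences are saturated."""
    p = sum(x != y for x, y in zip(s1, s2)) / len(s1)
    arg = 1.0 - 4.0 * p / 3.0
    return math.inf if arg <= 0 else -0.75 * math.log(arg)


# Split order used below: 0 = AB|CD (the true tree), 1 = AC|BD, 2 = AD|BC.
def parsimony_scores(a, b, c, d):
    """Support for each split = number of xxyy sites, the only
    parsimony-informative site patterns for four taxa."""
    support = [0, 0, 0]
    for sa, sb, sc, sd in zip(a, b, c, d):
        if sa == sb and sc == sd and sa != sc:
            support[0] += 1
        elif sa == sc and sb == sd and sa != sb:
            support[1] += 1
        elif sa == sd and sb == sc and sa != sb:
            support[2] += 1
    return support


def nj_scores(a, b, c, d):
    """Four-taxon neighbor joining picks the split minimizing d(i,j)+d(k,l);
    sums are negated so that, as with parsimony, larger is better."""
    dist = jc_distance
    return [-(dist(a, b) + dist(c, d)),
            -(dist(a, c) + dist(b, d)),
            -(dist(a, d) + dist(b, c))]


def credit(scores):
    """1/k if the true split (index 0) is among k tied best splits, else 0."""
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return 1.0 / len(winners) if 0 in winners else 0.0


def proportion_correct(branch_lengths, n_sites, n_reps, seed=1):
    """Fraction of replicates in which each method recovers ((A,B),(C,D)).
    branch_lengths = (A, B, C, D, internal), in substitutions per site."""
    t_a, t_b, t_c, t_d, t_int = branch_lengths
    rng = random.Random(seed)
    hits = {"parsimony": 0.0, "NJ (JC distances)": 0.0}
    for _ in range(n_reps):
        # x joins A and B; y joins C and D. Jukes-Cantor is time-reversible
        # with a uniform stationary distribution, so rooting at x is harmless.
        x = "".join(rng.choice(BASES) for _ in range(n_sites))
        y = jc_evolve(x, t_int, rng)
        a, b = jc_evolve(x, t_a, rng), jc_evolve(x, t_b, rng)
        c, d = jc_evolve(y, t_c, rng), jc_evolve(y, t_d, rng)
        hits["parsimony"] += credit(parsimony_scores(a, b, c, d))
        hits["NJ (JC distances)"] += credit(nj_scores(a, b, c, d))
    return {method: h / n_reps for method, h in hits.items()}


if __name__ == "__main__":
    print(proportion_correct((0.1, 0.1, 0.1, 0.1, 0.05), n_sites=500, n_reps=100))
```

Varying the five branch lengths, as the study does across the four-taxon parameter space, would show where each method's proportion of correct trees falls below a threshold such as 95%; no particular results are claimed for this sketch beyond the easy case in the usage line.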
It is possible, then, to examine the performance of methods under best-case conditions (i.e., when all the assumptions of the method are met). Simulation studies can